Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks
نویسندگان
چکیده
This paper presents an in-depth analysis of Intel’s Haswell microarchitecture for streaming loop kernels. Among the new features examined is the dual-ring Uncore design, Cluster-on-Die mode, Uncore Frequency Scaling, core improvements as new and improved execution units, as well as improvements throughout the memory hierarchy. The Execution-Cache-Memory diagnostic performance model is used together with a generic set of microbenchmarks to quantify the efficiency of the microarchitecture. The set of microbenchmarks is chosen such that it can serve as a blueprint for other streaming loop kernels.
منابع مشابه
An Evaluation of Intel's Restricted Transactional Memory for CPAs
With the release of their latest processor microarchitecture, codenamed Haswell, Intel added new Transactional Synchronization Extensions (TSX) to their processors’ instruction set. These extensions include support for Restricted Transactional Memory (RTM), a programming model in which arbitrary sized units of memory can be read and written in an atomic manner. This paper describes the low-leve...
متن کاملAn Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors
This paper presents a survey of architectural features among four generations of Intel server processors (Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) with a focus on performance with floating point workloads. Starting on the core level and going down the memory hierarchy we cover instruction throughput for floating-point instructions, L1 cache, address generation capabilities, core clock ...
متن کاملExecution-Cache-Memory Performance Model: Introduction and Validation
This report serves two purposes: To introduce and validate the Execution-Cache-Memory (ECM) performance model and to provide a thorough analysis of current Intel processor architectures with a special emphasis on Intel Xeon HaswellEP. The ECM model is a simple analytical performance model which focuses on basic architectural resources. The architectural analysis and model predictions are showca...
متن کاملA Performance Evaluation of the Nehalem Quad-Core Processor for Scientific Computing
In this work we present an initial performance evaluation of Intel's latest, secondgeneration quad-core processor, Nehalem, and provide a comparison to first-generation AMD and Intel quad-core processors Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architecture incorporating QuickPath Interconnect for interconnecting processors within a node, and the first to...
متن کاملKummer Strikes Back: New DH Speed Records
This paper introduces high-security constant-time variable-base-point Diffie–Hellman software using just 274593 Cortex-A8 cycles, 91460 Sandy Bridge cycles, 90896 Ivy Bridge cycles, or 72220 Haswell cycles. The only higher speed appearing in the literature for any of these platforms is a claim of 60000 Haswell cycles for unpublished software performing arithmetic on a binary elliptic curve. The...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016